Kernel density estimation

A useful density estimation technique. A “kernel” is a probability density function for a single data point. For instance, if we assume that, when we observe a single data point at $x$, the most likely underlying density distribution is $\mathcal{N}(x, \sigma^2)$, then this normal distribution is our kernel. Once we have a kernel function $K$ defined, we can simply accumulate the kernels for all $N$ data points and take the average: $\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} K(x - x_i)$.
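
As a minimal sketch of this accumulation (assuming NumPy and a Gaussian kernel with a hand-picked bandwidth `sigma`; the function name is just for illustration):

```python
import numpy as np

def gaussian_kde_1d(data, grid, sigma=0.3):
    """Evaluate a Gaussian KDE on `grid`: the average of the
    N(x_i, sigma^2) pdfs placed at each data point x_i."""
    data = np.asarray(data)[:, None]   # shape (N, 1)
    grid = np.asarray(grid)[None, :]   # shape (1, G)
    # One Gaussian pdf per data point, evaluated on the whole grid.
    kernels = np.exp(-0.5 * ((grid - data) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)        # average over data points -> density estimate

# Example: a small sample and a grid to evaluate on.
data = [1.0, 1.2, 2.5, 3.1, 3.3]
grid = np.linspace(0, 5, 200)
density = gaussian_kde_1d(data, grid)
```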

Another way to think about it is to start from a histogram. When we plot a histogram, we have to choose the bins. One simple way to avoid this choice is to put a box (with area $1/N$, where $N$ is the number of data points) centered at each data point. This is essentially kernel density estimation with a rectangular kernel function.
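
A sketch of this box-stacking view, assuming a rectangular kernel of width $h$ (each box then has height $1/(Nh)$ and area $1/N$):

```python
import numpy as np

def box_kde_1d(data, grid, h=0.5):
    """Stack a box of width h (area 1/N) centered at each data point."""
    data = np.asarray(data)[:, None]
    grid = np.asarray(grid)[None, :]
    # Each box has height 1/h wherever |x - x_i| < h/2; averaging over
    # the N data points scales every box down to area 1/N.
    boxes = (np.abs(grid - data) < h / 2) / h
    return boxes.mean(axis=0)
```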

When should we use KDE over a histogram? I don’t think there’s a simple answer to this. As we can see in the construction of KDE from a histogram, KDE resolves several issues with histograms (sensitivity to both the location and the size of the bins), and it is in many cases a better inference method (esp. when using the Gaussian kernel). However, KDE can hide some important details (e.g., how sparse the data is), and it can mislead when the data is bounded. For instance, if the data points are always non-negative and many of them sit at zero, a plain Gaussian KDE will show a probability density that stretches below 0, which is clearly inaccurate (see the sketch in the next section).

Boundary corrected kernel density estimation
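
One standard fix for the non-negative-data case above is the reflection (mirror) method: reflect the sample across the boundary, run the usual KDE on the augmented sample, and fold the mass back onto the valid side. A sketch for data bounded below at 0, reusing the Gaussian-kernel setup from earlier:

```python
import numpy as np

def reflected_gaussian_kde(data, grid, sigma=0.3):
    """Boundary-corrected Gaussian KDE on [0, inf) via reflection.
    `grid` is assumed to contain only non-negative values."""
    data = np.asarray(data)
    # Augment the sample with its mirror image across 0 ...
    augmented = np.concatenate([data, -data])[:, None]
    grid = np.asarray(grid)[None, :]
    kernels = np.exp(-0.5 * ((grid - augmented) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    # ... and divide by the ORIGINAL sample size, so the mass that would
    # have leaked below 0 is folded back and the density on [0, inf)
    # integrates to 1.
    return kernels.sum(axis=0) / len(data)
```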

Usages and tips

Python
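
For everyday use, scipy.stats.gaussian_kde is a common choice; a minimal usage sketch:

```python
import numpy as np
from scipy import stats

data = np.random.normal(loc=2.0, scale=0.5, size=200)

# gaussian_kde picks the bandwidth automatically (Scott's rule by
# default; pass bw_method to override it).
kde = stats.gaussian_kde(data)

grid = np.linspace(0, 4, 200)
density = kde(grid)  # evaluate the estimated density on the grid
```

scikit-learn’s sklearn.neighbors.KernelDensity supports more kernels (e.g., kernel='tophat' for the rectangular kernel) but leaves the bandwidth choice entirely to you.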